Ignore benchmark related files and internal benchmarks #74
Open
J535D165 wants to merge 1 commit into MrTango:main
shapiromatron (Collaborator) approved these changes on Aug 19, 2025 and left a comment:
Seems like an OK solution, though I wonder if it'd be better to generate a synthetic benchmark dataset instead of relying on hidden data that only one of our repository collaborators can run.
# created from tests
export.ris

# extra benchmark data only for internal use (because of copyright)
Collaborator
Any chance we could create some synthetic data using something like faker? https://github.com/joke2k/faker
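To make the synthetic-data idea concrete, here is a minimal sketch of what a generator for benchmark input might look like. It is only an illustration: it uses the standard library's `random` module rather than faker so that it runs without extra dependencies, and the names `make_record` and `make_ris`, the word pools, and the choice of RIS tags are all assumptions, not anything taken from this PR. Faker's `name()` and `sentence()` providers could replace the word pools for more realistic values.

```python
import random

# Hypothetical word pools for illustration only; a library such as
# Faker could supply more varied names and titles instead.
FIRST = ["Ada", "Grace", "Alan", "Edsger", "Barbara"]
LAST = ["Lovelace", "Hopper", "Turing", "Dijkstra", "Liskov"]
WORDS = ["parser", "benchmark", "reference", "citation",
         "format", "dataset", "analysis"]


def make_record(rng):
    """Build one synthetic RIS journal record as a list of lines."""
    lines = ["TY  - JOUR"]  # every RIS record starts with a type tag
    for _ in range(rng.randint(1, 4)):
        lines.append(f"AU  - {rng.choice(LAST)}, {rng.choice(FIRST)}")
    title = " ".join(rng.choice(WORDS) for _ in range(rng.randint(4, 10)))
    lines.append(f"TI  - {title.capitalize()}")
    lines.append(f"PY  - {rng.randint(1990, 2024)}")
    lines.append("ER  - ")  # end-of-record tag
    return lines


def make_ris(n_records, seed=0):
    """Return n_records synthetic RIS records as one string.

    A fixed seed keeps the output reproducible, so benchmark runs
    on different branches see identical input.
    """
    rng = random.Random(seed)
    records = ["\n".join(make_record(rng)) for _ in range(n_records)]
    return "\n\n".join(records) + "\n"


if __name__ == "__main__":
    # Print a small sample; a real benchmark would generate far more.
    print(make_ris(2))
```

Because the generator is seeded, a large file could be produced on demand instead of being checked in or kept private. Whether synthetic records exercise the parser the same way real-world exports do is an open question that would need measuring.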
Usually, I'm not a big fan of solutions like this, but given the importance of performance, I think this pragmatic solution can be acceptable. I utilize large, real-world datasets to benchmark the parser's performance and frequently need to switch between branches.
Btw, I'm making nice progress on the PubMed parsing PR, but there are still some open challenges. Performance is one of them.